Introduction

Background

As electronic health records (EHRs) and biobanks have become more widespread, so has the amount of data available for discovering relationships between a patient’s genotype and phenotype.

Individual-level data

EHRs contain vast quantities of information on individual patients. One useful subset is ICD-9 and ICD-10 codes, which are used to keep track of billable actions. Phecodes map ICD codes to phenotypes (patient conditions) for research purposes, e.g. Phecode 360.2: Progressive Myopia. Additionally, biobanks provide data on a patient’s biomarkers, such as genetic mutations or their genotype.
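
As a rough illustration of the ICD-to-Phecode step, the sketch below (in Python, standard library only) collapses billing codes into one set of Phecodes per patient. The two mapping entries, the row layout, and the function name `phenotypes_by_patient` are assumptions made for this example; they are not taken from the published Phecode map or from the app's code.

```python
from collections import defaultdict

# Hand-written, illustrative mapping only; NOT the published Phecode map.
# Keys are (vocabulary, ICD code); values are Phecodes.
ICD_TO_PHECODE = {
    ("ICD9", "360.21"): "360.2",   # assumed entry for Progressive Myopia
    ("ICD10", "H44.2"): "360.2",   # assumed entry for Progressive Myopia
}

def phenotypes_by_patient(billing_rows, code_map):
    """Collapse (patient_id, vocabulary, icd_code) rows into Phecode sets."""
    out = defaultdict(set)
    for patient_id, vocab, code in billing_rows:
        phecode = code_map.get((vocab, code))
        if phecode is not None:      # codes missing from the map are skipped
            out[patient_id].add(phecode)
    return dict(out)

# Toy billing rows (hypothetical patients).
rows = [
    ("p1", "ICD9", "360.21"),
    ("p1", "ICD10", "H44.2"),
    ("p2", "ICD10", "H44.2"),
    ("p3", "ICD9", "999.99"),        # not in the toy map, so it is dropped
]
patients = phenotypes_by_patient(rows, ICD_TO_PHECODE)
```

Several ICD codes can land on the same Phecode, which is why each patient ends up with a set rather than a list.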

PheWAS

A popular method for linking genotypes with phenotypes is the Phenome-Wide Association Study (PheWAS), developed at Vanderbilt. PheWAS uses EHR data to produce a list of phenotypes significantly associated with a pre-specified genotype.
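
The idea of testing one phenotype at a time can be sketched in a few lines of Python. This is a simplified stand-in, not the PheWAS implementation: it uses Fisher's exact test on a 2x2 table with a Bonferroni threshold, whereas real analyses usually fit regression models with covariates. The function names, the toy cohort, and the labels `pheno_A` and `pheno_B` are all hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def phewas(carrier, phenotypes, alpha=0.05):
    """carrier: patient_id -> bool; phenotypes: label -> set of patient ids."""
    threshold = alpha / len(phenotypes)          # Bonferroni correction
    results = []
    for label, cases in phenotypes.items():
        a = sum(1 for p, g in carrier.items() if g and p in cases)
        b = sum(1 for p, g in carrier.items() if g and p not in cases)
        c = sum(1 for p, g in carrier.items() if not g and p in cases)
        d = sum(1 for p, g in carrier.items() if not g and p not in cases)
        p_value = fisher_exact_two_sided(a, b, c, d)
        results.append((label, p_value, p_value < threshold))
    return sorted(results, key=lambda r: r[1])   # most significant first

# Toy cohort: patients 0-9 carry the genotype, patients 10-19 do not.
carrier = {i: i < 10 for i in range(20)}
phenotypes = {
    "pheno_A": set(range(9)) | {10},             # concentrated in carriers
    "pheno_B": {0, 1, 2, 3, 4, 10, 11, 12, 13, 14},  # evenly split
}
results = phewas(carrier, phenotypes)
```

Each entry in `results` is a (label, p-value, significant) tuple, which is the kind of ranked list a PheWAS produces for a single pre-specified genotype.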

App usage

What problems are solved?

Interacting with results

PheWAS results are typically delivered as static plots and tables. ME lets researchers explore the results interactively, digging in to look for the patterns that drive them.

Expanding past plain associations

PheWAS results look at a genotype’s association with phenotypes one phenotype at a time. By giving researchers the ability to look at the network behavior of genotype-phenotype associations, ME allows for more nuanced insights from the data than a single p-value can provide.
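
One simple way to look past single associations is to count how often pairs of phenotypes occur in the same patients and treat those counts as network edges. The Python sketch below shows that idea under stated assumptions: it is not necessarily how ME builds its network, and the function name `cooccurrence_edges`, the `min_patients` cutoff, and the toy patients are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(patient_phenotypes, min_patients=2):
    """Count phenotype pairs sharing patients; keep pairs seen often enough."""
    counts = Counter()
    for codes in patient_phenotypes.values():
        # Sort so each unordered pair is always counted under the same key.
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_patients}

# Toy data: phenotype labels per (hypothetical) patient carrying the genotype.
patient_phenotypes = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"B", "C"},
    "p4": {"A"},
}
edges = cooccurrence_edges(patient_phenotypes)
```

Here phenotype B links the other two, a pattern that separate per-phenotype p-values would not show.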

Current deployments

  • Exploring more nuanced heart failure phenotypes
  • Drug repurposing
  • Rare-disease detection

Interactivity driven by r2d3

All plots are custom-built interactive JavaScript visualizations made with the help of the r2d3 package.

Shiny modules

Because custom versions of ME with different visualization types are often needed, a helper package of Shiny modules, meToolkit, was built. Standardized inputs and outputs allow easy swapping and testing of app components.

Technologies & packages used

Development

All coding was done in RStudio Server Pro hosted on AWS EC2. The app is a Shiny app in dashboard format, built with shinydashboard. Custom interactive plots are all built with d3.js and called from R using r2d3.

Data Management

Data for the app are managed using Hive running on AWS Athena. The larger-than-memory datasets are stored in Apache Parquet files for efficient queries.

Deployment

Completed apps are most frequently hosted on an RStudio Connect server. Occasionally, apps are run locally in Docker containers for speed and security reasons.

About

Me (Nick Strayer):

  • 4th year PhD candidate in Biostatistics at Vanderbilt University
  • Previously at New York Times and Johns Hopkins Data Science Lab
  • nickstrayer.me, @nicholasstrayer, nstrayer, livefreeordichotomize.com

TBILab (Translational Bioinformatics Lab)

  • Focused on using modern data mining and machine learning techniques to help unveil valuable clinical information in messy EHR and biobank data.
  • PI: Yaomin Xu, Professor of Biostatistics and Bioinformatics.

Software

This research and application are made possible by the following support:

CTSA award No. UL1 TR002243 from the National Center for Advancing Translational Sciences.

Many thanks to those who helped support this research:

Quinn Wells, Pharm.D., M.D. | Michael R. Savona, M.D.
Joshua C. Denny, MD, MS | Vanderbilt Drug Repurposing program

Demo
